PREFACE

The function of a phonemic system is to distinguish the utterances 
of a language. One concept that has appeared in certain theories of 
linguistic change is that some contrasts between the phonemes of a 
language do more work than others. This Memorandum suggests and 
discusses criteria for the quantification of this concept for three 
possible cases. It should be of interest to theoreticians and 
investigators in linguistics. 

The author, Professor of Linguistics at Cornell University, is
a consultant to The RAND Corporation.
SUMMARY

Measures of the linguistic load carried by a contrast are developed
for three cases, in which the contrasts are taken to be, respectively,
phonemic, allophonic, and componential. The load carried by a contrast
is non-negative, and is zero if the "contrasted" units are identical
or if neither occurs in any environment in which the other is found.
The measure proposed is the change in entropy of the system if the
contrasted phonemes are coalesced; some problems peculiar to the
allophonic case are discussed. If each distinct bundle of components
is an allophone, the entropy of a given system is independent of point
of view (phonemic, allophonic, or componential).
THE QUANTIFICATION OF FUNCTIONAL LOAD

1. INTRODUCTION

Of the many problems in linguistics on which the work of A. Martinet
has shed light, one of the most interesting is the notion of
functional load (or yield or burden).[1] In simplest terms, the
notion is this: The function of a phonemic system is to keep the
utterances of a language apart. Some contrasts between the phonemes
in a system apparently do more of this job than others. For instance,
in English there are hundreds of pairs of words that differ only in
that one has /p/ where the other has /b/ (pat : bat, nipple : nibble,
cap : cab), but only a very few are kept apart by /š/ versus /ž/ (for
some speakers mesher : measure; for some Asher : azure; for some
Aleutian : allusion). Presumably, then, the contrast between /p/ and
/b/ does more work even in complete utterances than does that between
/š/ and /ž/. At least, it is easier to coin a pair of whole utterances
such as Don't take that cap : Don't take that cab than it is to find
one for /š/ and /ž/, simply because there are more minimally different
words of the first type.
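The kind of counting behind this comparison can be sketched in a few lines of Python. This is only an illustration: the toy lexicon, the phoneme symbols, and the function name `minimal_pairs` are assumptions introduced here, not data or procedures from the paper.

```python
from itertools import combinations

def minimal_pairs(lexicon, a, b):
    """Return pairs of words that differ in exactly one position,
    where one word has phoneme `a` and the other has `b` there."""
    pairs = []
    for w1, w2 in combinations(lexicon, 2):
        if len(w1) != len(w2):
            continue
        diffs = [(x, y) for x, y in zip(w1, w2) if x != y]
        if len(diffs) == 1 and set(diffs[0]) == {a, b}:
            pairs.append((w1, w2))
    return pairs

# Hypothetical toy lexicon; each word is a tuple of phoneme symbols.
LEXICON = [
    ("p", "æ", "t"), ("b", "æ", "t"),              # pat : bat
    ("k", "æ", "p"), ("k", "æ", "b"),              # cap : cab
    ("m", "ɛ", "š", "ər"), ("m", "ɛ", "ž", "ər"),  # mesher : measure
    ("s", "ɪ", "t"),
]

print(len(minimal_pairs(LEXICON, "p", "b")))  # pairs kept apart by /p/ : /b/
print(len(minimal_pairs(LEXICON, "š", "ž")))  # pairs kept apart by /š/ : /ž/
```

A raw count of this kind ignores how often each word is actually used, which is one reason a frequency-based measure is sought in what follows.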

Martinet's concern with functional load has been with its possible
relevance in linguistic change. Suppose, for example, that in a
particular community the random drift of sound change[2] threatens to wipe
out a contrast that carries a certain functional load. If that load
is sufficiently high, is it possible that exigencies of communication
would prevent the impending coalescence? How high must the load be
for this effect? Or, indeed, are adjustments by paraphrase always made,
so that the coalescence is free to proceed without impairing communication?



[1] Discussed in various essays, most of them included in André
Martinet, Économie des Changements Phonétiques, Berne, 1955. Martinet
cites various European predecessors, but I have not consulted their
works.

[2] "Random" is a difficult word; in particular, we are discussing
in this very paper a kind of factor that perhaps militates against
completely random randomness. However, I do find it necessary to
accept (as many contemporary linguists do not) the neogrammarian hypothesis
of "regularity", in a certain modernized version, for which see Sec. 3.2
of this paper and my "Sound Change", Language, Vol. 41, No. 2, pp. 185-
We can imagine the language of some community undergoing a
series of sound shifts that obliterate all distinctions and reduce
all utterances to the same dull blur. But we can be quite sure that,
if anything like this has ever in fact happened, it happened long ago
in the very earliest stages of human evolution, and the communities
in question ceased to be viable and left no mark on subsequent history.
For all the languages of today, and for all known to us via written
records or the comparative method, we can assuredly assert that a
certain minimal fluency is always maintained. If contrasts carrying
a certain functional load are lost, new contrasts develop to take
over the load, or some of the contrasts not lost assume an additional
share.

This does not help us very much, because we do not know what the
required "minimal fluency" is, nor do we even know how to express such
a "minimal fluency" in quantitative terms.

Another possible approach is to observe actual instances of
lost contrasts. For example, almost all varieties of American English
have lost the contrast between /t/ and /d/ after a stressed vowel
before an unstressed vowel, so that such pairs as matter and madder,
latter and ladder, sweetish and Swedish have become completely
homophonous. True enough, most of us Americans can resort, in an emergency,
to an artificial spelling pronunciation that restores the distinction;
but most of the time we don't. To a speaker of British English, this
particular coalescence is one of the most striking features of the
"slurred" speech of Americans. Yet American English is clearly viable
without the contrast. Now, if we could meaningfully quantify the
functional load carried by this particular contrast before it was
lost, we would know, at least, that that much load is not enough to
prevent a coalescence, because, in fact, it didn't.

The present paper has limited aims. I shall not express
opinions of Martinet's various suggestions about functional load.
I believe he has carried the matter as far as it can be carried without
actual quantification. His hunches are incisive and suggestive, and
perhaps in part wrong; but they cannot be confirmed or disproved merely
by someone else's hunches. The next step in this area of investigation
must be the development of quantitative methods. That is what will
be undertaken here,[3] but only in an abstract way; I have done no counting.

We shall consider three cases: functional load in terms of (1)
phonemic contrasts, (2) allophonic contrasts, and (3) componential
contrasts. There is currently a very active debate as to the relative
importance of these three different sorts of units in phonology.
Although I take certain positions in this debate, I do not want to
import them into the present paper. By dealing with all three cases,
we can supply the requisite formal tools for quantification regardless
of how the debate is eventually resolved.

2. CASE 1 — PHONEMES 

2.1. Algebra 

2.1.1. Let $L^m$ be a phonemic system with m phonemes /1/, /2/, ...,
/m/. In the terminology of algebraic grammar[4] (which uses some words
familiar from ordinary linguistics, but in potentially deceptive
special senses), these m phonemes are the characters of a linear
alphabet. This means that: (1) m is finite; (2) the characters can
be anything at all, as long as they are pairwise distinguishable;
and (3) every utterance of the language of which $L^m$ is the phonemic
system consists, without residue, of a string of occurrences of
characters of $L^m$. (On the other hand, of course, not every string
of occurrences of characters is necessarily an utterance of the language.)



[3] An earlier and briefer effort of mine to quantify functional
load will be found in my Manual of Phonology, Indiana University
Publications in Anthropology and Linguistics, No. 11, 1955. This
earlier effort was vitiated by a mathematical error, which will be
pointed out below.

[4] See my "Language, Mathematics, and Linguistics", to appear in
Current Trends in Linguistics, Vol. 3, 1966. The elements of a set
may conveniently be called "characters" merely if they are pairwise
distinguishable. This may seem redundant, but it is not: in some
sets that must be discussed mathematically, the elements are not
distinguishable. For example, one can tell the difference between
an electron and a proton, or between one electron and two electrons,
but not between one electron and another electron.


In Sec. 2 we ignore any variations in actual physical properties of a
character from one occurrence to another; in Sec. 3 we shall pay
systematic attention to such variations. Also, in Sec. 2 we ignore any
partial resemblances between characters (such as the feature of
bilabiality common to English /p/ and /b/); this is underscored by the
inclusion of "linear" above. In Sec. 4 we shall deal with such
resemblances.

For our first step, we forget (for the moment) that the elements
of $L^m$ are phonemes, and take $L^m$ merely as a finite set of characters
(that is, of pairwise distinguishable elements). Let us consider the
system $S^m$ whose elements are all the partitions of the characters of
$L^m$. We can illustrate what is meant by a "partition" by assuming some
small value for m, say m = 4. Then each of the following lines displays
one of the possible partitions of the four characters; we label them for
subsequent cross-reference:

    $L^4$:    /1/ /2/ /3/ /4/
    $L^3_1$:  /12/ /3/ /4/
    $L^3_2$:  /13/ /2/ /4/
    $L^3_3$:  /14/ /2/ /3/
    $L^3_4$:  /23/ /1/ /4/
    $L^3_5$:  /24/ /1/ /3/
    $L^3_6$:  /34/ /1/ /2/
    $L^2_1$:  /123/ /4/
    $L^2_2$:  /12/ /34/
    $L^2_3$:  /124/ /3/
    $L^2_4$:  /13/ /24/
    $L^2_5$:  /134/ /2/
    $L^2_6$:  /14/ /23/
    $L^2_7$:  /234/ /1/
    $L^1$:    /1234/

We see that a partition is an assignment of all the characters to classes,
where each character is assigned to some class, and none is assigned to
more than one. That is, /12/ /3/ is not a partition of our four
characters because one has been left out; and /12/ /13/ /4/ is not a partition
because one has been assigned twice. The total number of partitions
of a set of m characters is a function of m. For m = 2 there are just
two partitions; for m = 3, there are 5; for m = 4, as shown above,
there are 15; for m = 5, there are 52. For still higher values of m
the number of partitions increases very rapidly. The fifteen partitions
listed above are the elements of the system $S^4$.
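The counts quoted here (2, 5, 15, 52) can be checked by brute-force enumeration. The following is a minimal sketch; the generator name `partitions` and the representation of a partition as a list of lists are choices made for this illustration only.

```python
def partitions(chars):
    """Yield every partition of the list `chars` as a list of classes."""
    if not chars:
        yield []
        return
    first, rest = chars[0], chars[1:]
    for part in partitions(rest):
        # Put `first` into each existing class in turn ...
        for k in range(len(part)):
            yield part[:k] + [[first] + part[k]] + part[k + 1:]
        # ... or into a new class of its own.
        yield [[first]] + part

for m in range(2, 6):
    count = sum(1 for _ in partitions(list(range(1, m + 1))))
    print(m, count)
```

The recursion places the first character either into one of the classes of a partition of the remaining characters or into a class by itself, so every partition is produced exactly once.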
Suppose that L and L' are two of the partitions of a system $S^m$;
and suppose, further, that we can (so to speak) change L to L' by
"coalescing" one or more of the classes of L into a single class of L'.
This means that, if two characters x and y belong to different classes
of L, they may or may not belong to the same class of L'; but if x
and y belong to the same class of L, they must also belong to the same
class of L'. If this relation holds between a particular pair of
partitions L and L', we say that $L \ge L'$. For example, in $S^4$, whose
elements are listed above, the partition /1/ /2/ /3/ /4/ stands in this
relation to /12/ /3/ /4/, and /12/ /3/ /4/ to /123/ /4/; any partition
even stands in it to itself (indeed, if L is any partition, then
$L \ge L$); but, clearly, /12/ /3/ /4/ does not stand in it to
/13/ /2/ /4/, nor /123/ /4/ to /12/ /3/ /4/.
Figure 1 displays the system $S^4$ graphically. The nodes represent
the fifteen partitions, and are appropriately labelled. If, given two
distinct partitions L and L', it is the case that L' can be obtained
from L by coalescing classes, then, in the figure, it is possible to
pass from L to L' along one or more connecting lines, moving generally
from left to right (perhaps slanting upwards or downwards, but never
backing up from right to left).
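The ordering relation between partitions described above is easy to test mechanically. The sketch below is illustrative only: the function name `refines` and the use of Python sets for classes are assumptions of this example. It returns true exactly when any two characters sharing a class of the first partition also share a class of the second.

```python
def refines(L, L_prime):
    """True if L' can be reached from L by coalescing classes, i.e. characters
    that share a class of L also share a class of L'."""
    class_of = {x: i for i, cls in enumerate(L_prime) for x in cls}
    return all(len({class_of[x] for x in cls}) == 1 for cls in L)

L4 = [{1}, {2}, {3}, {4}]
A = [{1, 2}, {3}, {4}]
B = [{1, 3}, {2}, {4}]
C = [{1, 2, 3}, {4}]

print(refines(L4, A))  # the four-class partition precedes /12/ /3/ /4/
print(refines(A, C))   # /12/ /3/ /4/ precedes /123/ /4/
print(refines(A, A))   # every partition stands in the relation to itself
print(refines(A, B))   # /12/ /3/ /4/ and /13/ /2/ /4/ are not comparable
print(refines(C, A))   # coalescence cannot be undone
```

In the figure, a true result for two distinct partitions corresponds to a left-to-right path between the two nodes.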

Fig. 1. The system $S^4$

The system $S^m$ of all partitions of a set $L^m$ of m characters, with
the relation defined as we have defined it, is known to be an
exemplification of a formal mathematical system called a relatively
complemented semimodular lattice (or matroid lattice).[5] Any property
shared by all matroid lattices will, of course, hold for any system
$S^m$ even when we put a different interpretation on our symbols. For
the application we have in view, most of these properties are quite
irrelevant. But we do need to note the following, all of which can
easily be read from Fig. 1 for the specific example displayed there:

[5] Garrett Birkhoff, Lattice Theory, 2nd ed., 1949, p. 107.
(1) A system $S^m$ includes a unique universal upper bound $L^m$ such
that, if L is any partition in $S^m$, then $L^m \ge L$, and a unique
universal lower bound $L^1$ such that, if L is any partition in $S^m$,
then $L \ge L^1$. In Fig. 1, these unique elements appear as the
leftmost and rightmost nodes ($L^4$ and $L^1$).

(2) Given any element L other than the universal upper bound, it
is possible to find an element L' distinct from L such that $L' \ge L$
but such that there is no element L'', distinct from both, for which
$L' \ge L'' \ge L$; we shall then say that $L' \rightarrow L$.
Similarly, given any element L other than the universal lower bound, it
is possible to find an element L' distinct from L such that $L \ge L'$
but such that there is no element L'', distinct from both, for which
$L \ge L'' \ge L'$; we shall say that $L \rightarrow L'$.

(3) A chain from $L^m$ to $L^1$ is any set of elements $L_m, L_{m-1},
\ldots, L_1$ of $S^m$ such that
$L^m = L_m \rightarrow L_{m-1} \rightarrow \cdots \rightarrow L_1 = L^1$.
Clearly (see Fig. 1) the number of elements in any chain in $S^m$ is
just m. Two chains in $S^m$ are distinct if one of them contains at
least one element of $S^m$ not in the other. The number of distinct
chains in $S^m$ is

$$\frac{m(m-1)^2(m-2)^2 \cdots 2^2 \cdot 1}{2^{m-1}} = \frac{m!\,(m-1)!}{2^{m-1}}.$$
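The chain count can be checked for small m by walking every path of pairwise coalescences from the m-class partition down to the single class. This is a verification sketch only; the names `count_chains` and `closed_form` are introduced here for illustration, and the closed form tested is m!(m-1)!/2^(m-1).

```python
from itertools import combinations
from math import factorial

def count_chains(m):
    """Count chains from the m-class partition to the one-class partition,
    coalescing exactly one pair of classes at each step."""
    def walk(partition):
        if len(partition) == 1:
            return 1
        total = 0
        for a, b in combinations(partition, 2):
            merged = (partition - {a, b}) | {a | b}
            total += walk(frozenset(merged))
        return total
    start = frozenset(frozenset([i]) for i in range(1, m + 1))
    return walk(start)

def closed_form(m):
    return factorial(m) * factorial(m - 1) // 2 ** (m - 1)

for m in range(2, 6):
    print(m, count_chains(m), closed_form(m))
```

Each step from a partition with k classes offers k(k-1)/2 choices of pair, which is where the product in the formula comes from.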






2.1.2. Now we return to the interpretation of $L^m$ as a phonemic
system. We shall imagine that we can operate on $L^m$ in the following
way to produce a new phonemic system: we select any two phonemes /i/
and /j/ of $L^m$ and agree to ignore the difference between them, so
that the new system will not contain either /i/ or /j/ but only a new
phoneme /ij/; otherwise the new system is just like the old one. This
finds its diachronic analog in the coalescence of two phonemes of an
earlier stage of a language to form a single phoneme at a later stage.
Note that our notation "/ij/" does not represent a string of two
phonemes, as it would in ordinary linguistic usage. When we need to
represent such a string, we will insert commas: /i,j/.

With this interpretation, the system $S^m$ consists of a basic
phonemic system, $L^m$, plus just all those other phonemic systems that
can be obtained from $L^m$ by one or more pairwise coalescences of the
kind just described. Thus, in Fig. 1, the phonemes of the basic system
are /1/, /2/, /3/, and /4/; if we now ignore the distinction between
/1/ and /2/, we get the system with phonemes /12/, /3/, and /4/,
and so on. This change of interpretation does not in any way alter
the fact that $S^m$ is a matroid lattice, with all the formal properties
of any such system. Of course, all of this is a matter of mathematical
convenience; in particular, $L^1$ obviously could not be a real phonemic
system, and even m = 4 is too low a value for a real one. But our
application of the formal apparatus will be such that these departures
from reality do not matter.

2.2. Measure 

2.2.1. The notion of functional load is that a phonemic system
$L^m$ has a (quantifiable) job to do, and that the contrast between
any two phonemes, say /a/ and /b/, carries its share. There is only
one way in which the contrast between /a/ and /b/ can stop doing its
share of the work: that is for /a/ and /b/ to coalesce, yielding a
new phoneme /ab/ and hence a new system L'. It makes sense to infer
that if the contribution of the contrast between /a/ and /b/ is thus
withdrawn, one of two things must happen: (1) the job done by the
whole system is rendered smaller; or (2) the total job remains the
same, and the share no longer carried by the lost contrast is somehow
divided up among the contrasts that remain. We shall first explore
alternative (1), returning to (2) below in Sec. 2.2.7.

2.2.2. We assume that the load of work done by a whole phonemic
system can be expressed in the form of a nonnegative real number (a
"negative" load seems not to make sense). Let $f(L)$ be the load carried
by a system L, and let $f(/a/,/b/)$ be the share carried by the contrast
between /a/ and /b/. Then, under alternative (1), with L' the system
in which /a/ and /b/ have coalesced, we have

$$f(L') = f(L^m) - f(/a/,/b/)$$

or, transposing,

$$f(/a/,/b/) = f(L^m) - f(L').$$

The equation is more useful in this second form, because it suggests
that if we can find an appropriate measure of the functional loads of
whole systems of $S^m$, then a suitable and suitably related measure of
the functional loads of individual contrasts is immediately at hand.
A desirable property for $f$ is that $f(L^1) = 0$. The reason is that
a system with no contrasts can carry no information; as we have already
said, $L^1$ is only a mathematical convenience, not by any stretch of the
imagination a phonemic system. Similarly, for any phoneme /x/ we must
have $f(/x/,/x/) = 0$: for if we "operate" on a system by agreeing to
ignore the difference between a phoneme and itself, we have not changed
the system at all.

Another desirable property is that the contribution of any contrast
should be at least zero: that is, if $L_2$ arises from $L_1$ by a single
coalescence ($L_1 \rightarrow L_2$), then $f(L_1) \ge f(L_2)$.
The justification for allowing a zero load, but not a negative one,
requires some discussion of phonological theory.

Two phonemes may have nonintersecting distributions, in the sense
that neither occurs in any environment in which the other is found.
By one possible phonemicization of English, this is true of /h/ and
/ŋ/. But if English /h/ and /ŋ/ have this distribution, then no pair
of utterances can differ only in that one has /h/ where the other has
/ŋ/. Consequently, a coalescence of /h/ and /ŋ/ (however difficult
to imagine phonetically) would destroy no contrasts of whole utterances;
the total load carried by the system would be undiminished. Therefore

$$f(/h/,/ŋ/) = 0.$$
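The argument can be illustrated on a toy lexicon: coalescing two phonemes that never occur in the same environment merges no distinct forms, whereas coalescing two that do contrast may. The lexicon, the symbols, and the function name `forms_lost` below are hypothetical and serve only this illustration.

```python
def forms_lost(lexicon, a, b):
    """Number of distinct forms that become identical to another form
    when phonemes `a` and `b` are replaced by one merged symbol."""
    merged = {tuple((a, b) if ph in (a, b) else ph for ph in w) for w in lexicon}
    return len(set(lexicon)) - len(merged)

# Hypothetical toy lexicon: /h/ only word-initially, /ŋ/ only word-finally.
LEXICON = [
    ("h", "æ", "t"), ("h", "ɪ", "t"),
    ("s", "ɪ", "ŋ"), ("s", "æ", "ŋ"),
    ("p", "æ", "t"), ("b", "æ", "t"),
]

print(forms_lost(LEXICON, "h", "ŋ"))  # no forms collapse
print(forms_lost(LEXICON, "p", "b"))  # pat and bat collapse
```

This checks only whole forms in a list; it says nothing about relative frequencies, which the measure developed below takes into account.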

If two phonemes are not in nonintersecting distribution, however,
then there must exist at least one environment in which both occur.
Now, for two allophones to be phonemically different, it is sufficient
that they should be in direct contrast in a small environment. It is
therefore possible for two phonemes to stand in contrast in small
environments, and still not serve as the sole differentia of two whole
words or two whole utterances. On the other hand, it is quite impossible
for two words or two utterances to be kept apart by a single-phoneme
difference unless the two phonemes involved also contrast in small
environments. The inferences are as follows. Suppose we have a
method of measuring functional load by successive approximations,
and that the earlier approximations involve the inspection only of
small environments. For these earlier approximations, any contrast
between phonemes that are not in nonintersecting distribution will
prove to carry a positive share of the total load. As the approximations
continue, and larger environments are taken into consideration, some
of these shares may become vanishingly small, but none can become negative.
We ask next if the measure ought to be additive, in the sense
that the sum of all loads carried by all contrasts between pairs of
phonemes in $L^m$ would be just the load carried by $L^m$. The hasty answer
is affirmative, but wrong.[6] We must remember the nature of a phonemic
system. If a system $L^m$ were a set of m elements each of which
individually made some contribution to a measure defined for the whole
system, then additivity might be natural. We would assume, in such
a case, that the elements of $L^m$ could be deleted, one by one, until
all were gone, and that the measure would correspondingly diminish
to zero. But a phonemic system is not composed of elements that can
be deleted in this way, and the measure is not defined for single
elements, only for pairs. The pairs are not independent. They cannot
be "deleted"; they can only coalesce. If our first step is to coalesce
/a/ and /b/ into a new phoneme /ab/, then it is no longer possible to
perform a similar operation on any pair /a/, /x/, or /b/, /x/, since
the phonemes /a/ and /b/ are no longer present.

2.2.3. To summarize: we want the measure to have the following
properties, all but the last of which have now been discussed:

(P1) $f(L) \ge 0$ for all L in $S^m$.

(P2) $f(L^1) = 0$.

(P3) If $L_1 \rightarrow L_2$, then $f(L_1) \ge f(L_2)$.

(P4) If $L_1 \rightarrow L_2$ and $L_3 \rightarrow L_4$, where $L_1$ and
     $L_3$ contain /a/ and /b/ while $L_2$ and $L_4$ contain /ab/,
     then $f(L_1) - f(L_2) = f(L_3) - f(L_4)$.

Property P1, of course, follows from P2 and P3.

Property P3 guarantees that the load carried by the universal upper
bound of $S^m$ is the upper bound of the loads carried by the systems of
$S^m$.



[6] This hasty answer was the error in my earlier discussion, cited
in footnote 3. The error was called to my attention by William S.-Y. Wang.
Property P4 guarantees that the load carried by a contrast does
not vary depending on where within $S^m$ we choose to measure it.

If a measure has all four of these properties, then we can extend
it to cover any subset of the phonemes of any system L of $S^m$ as
follows: Let L be a system in which the phonemes /a/, /b/, ..., /i/
are all distinct, and let L' differ from L only in that /a/, /b/, ...,
/i/ have all coalesced into a single phoneme /ab...i/. Then we define

(D1) $f(/a/, /b/, \ldots, /i/) = f(L) - f(L')$.

When there are only two phonemes in the set, /a/ and /b/, then this
reduces to the second form of the equation in Sec. 2.2.2, and gives
us the appropriate measure of the functional load of a single contrast.
Further, if /a/ = /b/, then L' = L and $f(L) - f(L') = 0$, so that, as
desired, $f(/x/, /x/) = 0$ for any phoneme /x/.

Any measure with the first three of these properties shows what
we may call additivity along a chain. Suppose we move along any chain of
$S^m$ from $L^m$ to $L^1$. Each step involves coalescing a single pair of
phonemes /x/, /y/ of the predecessor into a single phoneme /xy/ of the
successor. The sum of $f(/x/,/y/)$ for all pairs coalesced in passing
from $L^m$ to $L^1$ is just $f(L^m)$. This sum is obviously the same
regardless of choice of chain. The individual addends need not be the
same. But if the measure also has property P4, then the addends along
any chain are a permutation of those along any other chain that
involves just the same coalescences.
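Additivity along a chain can be illustrated numerically. The sketch below makes several assumptions that are not in the text at this point: it uses a first-order entropy over hypothetical relative frequencies as the measure, and the helper names (`H1`, `coalesce`, `loads_along_chain`) are invented for the example.

```python
from math import log2

def I(p):
    return -p * log2(p) if p > 0 else 0.0

def H1(freqs):
    """First-order entropy of a system given as {phoneme: relative frequency}."""
    return sum(I(p) for p in freqs.values())

def coalesce(freqs, x, y):
    """Return the system in which phonemes x and y have become one phoneme."""
    merged = {k: v for k, v in freqs.items() if k not in (x, y)}
    merged[x + y] = freqs[x] + freqs[y]
    return merged

def loads_along_chain(freqs, steps):
    """Load lost at each successive coalescence in `steps`."""
    loads = []
    for x, y in steps:
        new = coalesce(freqs, x, y)
        loads.append(H1(freqs) - H1(new))
        freqs = new
    return loads

L4 = {"1": 0.4, "2": 0.3, "3": 0.2, "4": 0.1}  # hypothetical frequencies
chain_a = [("1", "2"), ("3", "4"), ("12", "34")]
chain_b = [("3", "4"), ("1", "2"), ("12", "34")]

print(H1(L4))
print(loads_along_chain(L4, chain_a), sum(loads_along_chain(L4, chain_a)))
print(loads_along_chain(L4, chain_b), sum(loads_along_chain(L4, chain_b)))
```

Both chains involve the same three coalescences, so under this first-order measure the addends are the same numbers in a different order, and each sum equals the load of the full system.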

2.2.4. A measure that meets the requirements proposed in Sec.
2.2.3 is Shannon's entropy H (in binits per symbol).[7]

Let $p$ be a relative frequency (or a probability), $0 \le p \le 1$. Then
we define:

$$I(p) = \begin{cases} -p \log_2 p & \text{if } p \ne 0, \\ 0 & \text{if } p = 0. \end{cases}$$

[7] Claude E. Shannon, "The Mathematical Theory of Communication", in
Claude E. Shannon and Warren Weaver, The Mathematical Theory of
Communication, Urbana, 1949.
Now let $p_{/i/}$ be the relative frequency of phoneme /i/, and let
$I(p_{/i/}) = I_{/i/}$. Then the first-order approximation to the entropy of
$L^m$ is defined as:

$$H_1(L^m) = \sum_{i=1}^{m} I_{/i/}.$$

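Similarly, if $p_{/i_1, i_2, \ldots, i_n/}$ is the relative frequency of the string
indicated, then the nth-order approximation is

$$H_n(L^m) = \frac{1}{n} \sum_{i_1, i_2, \ldots, i_n} I_{/i_1, i_2, \ldots, i_n/},$$

where each $i_k$ ranges independently from /1/ to /m/.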

If there is reason to believe that the proper limit exists, then
we can define the entropy of $L^m$ to be

$$H(L^m) = \lim_{n \to \infty} H_n(L^m).$$

Otherwise we can define $H = H_n$ for some suitably large n; this is
discussed below in Sec. 2.2.6.
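A low-order approximation of this kind, and the load of a contrast derived from it, might be computed from a sample of transcribed utterances roughly as follows. This is a sketch under stated assumptions: the per-symbol normalization (dividing the entropy of length-n strings by n), the toy corpus, and the names `block_entropy_per_symbol` and `functional_load` are all choices made for this example rather than procedures given in the paper.

```python
from collections import Counter
from math import log2

def I(p):
    return -p * log2(p) if p > 0 else 0.0

def block_entropy_per_symbol(utterances, n):
    """Entropy of the length-n phoneme strings found in `utterances`,
    divided by n (an assumed normalization to binits per symbol)."""
    counts = Counter(
        tuple(u[k:k + n]) for u in utterances for k in range(len(u) - n + 1)
    )
    total = sum(counts.values())
    return sum(I(c / total) for c in counts.values()) / n

def functional_load(utterances, a, b, n=1):
    """Entropy lost when phonemes `a` and `b` are coalesced."""
    merged = [[a + b if ph in (a, b) else ph for ph in u] for u in utterances]
    return (block_entropy_per_symbol(utterances, n)
            - block_entropy_per_symbol(merged, n))

# Hypothetical toy corpus of phonemically transcribed utterances.
CORPUS = [list("pat"), list("bat"), list("kap"), list("kab")]

print(functional_load(CORPUS, "p", "b", n=1))
print(functional_load(CORPUS, "p", "b", n=2))
```

Relative frequencies here are taken over running text (every occurrence counts), and a real application would need a far larger sample before the higher-order figures meant anything.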

2.2.5. For any system L in $S^m$, we define $f(L) = H(L)$. This
measure has the four properties set forth in Sec. 2.2.3. Most of the
proof is simple. We need only consider property P3.

The discussion of Sec. 2.2.2 shows that all we need demonstrate
here is that two phonemes, both of which occur in some small
environment, cannot make a zero or negative contribution to a sufficiently
low-order approximation to the entropy. Let $p_{/a/} = p$ and $p_{/b/} = q$
be the relative frequencies of /a/ and /b/ in the particular
environment. Then both p and q lie in the open unit interval, as does their
sum $p + q$, except that $p + q = 1$ just if /a/ and /b/ are the only
phonemes that occur in the given environment.

We now show that, for all possible values of p and q with the
constraints just given, $I(p) + I(q) > I(p+q)$. The proof is direct
(rather than contrapositive):

$$p < p + q \quad\text{and}\quad q < p + q,$$
$$\log p < \log (p+q) \quad\text{and}\quad \log q < \log (p+q),$$
$$p \log p < p \log (p+q) \quad\text{and}\quad q \log q < q \log (p+q).$$

Adding these two inequalities, we get

$$p \log p + q \log q < (p+q) \log (p+q)$$

or

$$-p \log p - q \log q > -(p+q) \log (p+q),$$

which, by the definition of $I$, is the proposition to be proved.
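As a sanity check on the inequality, it can be tested numerically over a grid of admissible values. The grid resolution and variable names below are arbitrary choices for this illustration; the check supplements the proof and does not replace it.

```python
from math import log2

def I(p):
    return -p * log2(p) if p > 0 else 0.0

steps = 50
# All grid points with 0 < p, 0 < q, and p + q <= 1.
violations = [
    (i / steps, j / steps)
    for i in range(1, steps)
    for j in range(1, steps - i + 1)
    if not I(i / steps) + I(j / steps) > I((i + j) / steps)
]
print(violations)  # an empty list means no counterexample on the grid
```

The case p + q = 1 is included: there I(p + q) = 0 while I(p) and I(q) are both positive.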

2.2.6. Although the mathematically most tempting definition of H
involves a limiting process, this raises the question as to whether the
desired limit exists. In one practical sense, this does not matter:
any actual computations of functional loads based on our formalism are
going to settle for relatively low-order approximations.

But there is also an empirical reason why we should perhaps not
worry about the limit even if, mathematically, it does exist. People
do not speak in indefinitely long utterances. Furthermore, truly long
utterances (such as a political harangue or a university lecture) are
broken into successive segments each of which has some sort of unity
and cohesiveness about it. In some languages, words (when properly
defined) have such unity and cohesiveness; in others, phonemic phrases
of some sort do. It may perhaps be suggested that an appropriate
definition of H is $H = H_k$, where k is the length in phoneme-occurrences
of the longest cohesive unit of whatever type is chosen; or, perhaps,
we should let k be the average length in phoneme-occurrences of such
units, where the averaging is based on text-frequency, not list-frequency.

2.2.7. Alternative Measure (1). The entropy H used in Sec. 2.2.4
is in binits per symbol-occurrence. If the average rate of emission of
phoneme-occurrences is r per second, then $\bar{H} = rH$ is the entropy
measured in shannons (binits per second).

A way to achieve alternative (2) of Sec. 2.2.1 is to assume that
if H is decreased, r must increase enough to keep $\bar{H}$ unchanged.
That is, $\bar{H}$ becomes a constant for any basic system $L^m$ and its
derivatives within $S^m$. Then $H(/a/,/b/)$, as already defined, can be
regarded as an indirect measure of the increase of r required to
compensate for the loss of the contrast between /a/ and /b/. Let L' be
the system derived from $L^m$ by the coalescence of /a/ and /b/; let r'
be the required new rate of emission; and let $s(/a/,/b/) = r'/r$. Then

$$s(/a/,/b/) = \frac{H(L^m)}{H(L')} = \frac{H(L^m)}{H(L^m) - H(/a/,/b/)}.$$

It is obvious that as we pass along a chain of $S^m$ towards $L^1$, r' and s
both increase without limit.

Since s is defined only for contrasts, we cannot test it for
properties P1-P3 of Sec. 2.2.3 unless we somehow extend it to systems as well
as contrasts, but there seems to be no natural way of doing this. We
can test for property P4, and it turns out that s does not have this
property: in general, an "early" loss of a particular contrast (that is,
a loss closer to $L^m$ on some chain from $L^m$ to $L^1$) entails a smaller s
than a "late" loss of the same contrast. However, we could always
agree to measure s starting with $L^m$. In any event, s is related so simply
to H that information about s, if wanted, requires only trivial
computation beyond that for H.
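The dependence of the rate ratio on where the loss occurs can be seen in a small computation. The example assumes a hypothetical system of four equiprobable phonemes and a first-order entropy; the function names `H1` and `rate_ratio` are introduced only for this sketch, with the ratio taken as the entropy before the coalescence divided by the entropy after it.

```python
from math import log2

def H1(freqs):
    """First-order entropy of a list of relative frequencies."""
    return -sum(p * log2(p) for p in freqs if p > 0)

def rate_ratio(before, after):
    """Factor by which the emission rate must rise to keep rate * entropy fixed."""
    return H1(before) / H1(after)

# Early loss: /a/ and /b/ coalesce in the full four-phoneme system.
early = rate_ratio([0.25, 0.25, 0.25, 0.25], [0.5, 0.25, 0.25])
# Late loss: the same contrast is lost after /c/ and /d/ have already coalesced.
late = rate_ratio([0.25, 0.25, 0.5], [0.5, 0.5])

print(early, late)
```

The entropy lost is the same in both cases under this first-order measure, but it is a larger fraction of what remains when the loss comes late, so the required speed-up is greater.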

The possible empirical significance of s is not clear. Offhand,
one might guess that all human languages have just about the same amount 
of work to do. No language is spoken always at the same rate, but there 
do seem to be variations from language to language as to "normal" or 
"average" rate, perhaps also as to maximum intelligible rate. If the 
guess just mentioned is valid, one might suspect that the average rate,
or the maximum rate, is higher for a language with a relatively less
complicated phonemic system. Impressionistically, Japanese seems to
be spoken faster than German or Russian, and Hawaiian perhaps faster 
than Japanese. We need accurate measurements rather than impressions; 
perhaps some have been made, but I do not know about them. We do know 
that the Lord's Prayer is longer (in total number of phoneme occurrences) 
in those Chinese dialects that have less phonemic differentiation than 
in those that have more;[8] but this would attest to our guess only if
the time of delivery for the prayer were about the same for all dialects,
and on this we have no information.

[8] Vide Y. R. Chao.

2.2.8. Alternative Measure (2). A different sort of measure is
supplied by Shannon's relative entropy C.[9] For any system $L^m$,
$C(L^m) = H(L^m)/\log_2 m$. The denominator in this expression is the
entropy of a system with m phonemes in which all phonemes are constantly
equiprobable, and is the maximum entropy achievable (neglecting channel
noise) with m elements. Since both numerator and denominator are in
binits, C itself is an absolute number; it is also independent of time,
since a rate of emission r cancels: $rH(L^m)/(r\log_2 m) = C(L^m)$.
Since for $L^1$ the formula reduces to the indeterminate form 0/0, we
must specify that $C(L^1) = 0$.

Definition (D1) of Sec. 2.2.3 now says that the load carried by any
set of contrasting phonemes is the loss in relative entropy entailed
by the loss of the contrasts.

This measure has properties P1 and P2, but not P3 or P4. For,
consider an artificial system $L^4$ with four phonemes /a/, /b/, /c/,
and /d/, each with constant probability 1/4. Let $L^3_1$ contain /ab/, /c/,
and /d/; let $L^3_2$ contain /a/, /b/, and /cd/; and let $L^2$ contain /ab/
and /cd/. Then $C(L^4) = C(L^2) = 1$; but $C(L^3_1) = C(L^3_2) < 1$. But
$L^3_1$ differs from $L^4$ only in that /a/ and /b/ have coalesced, and
$L^2$ differs from $L^3_2$ only in the same way. Thus (1) the load carried
by the contrast between /a/ and /b/ in $L^3_2$ is negative, contrary to
property P3; and (2) the load carried by the contrast between /a/ and
/b/ depends on where in $S^4$ it is measured, contrary to property P4.
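The figures in this counterexample are easy to reproduce. The sketch below uses first-order entropies of the four artificial systems; the variable names for the systems and the helper names `H1` and `C` are labels chosen for this illustration.

```python
from math import log2

def H1(freqs):
    """First-order entropy of a list of relative frequencies."""
    return -sum(p * log2(p) for p in freqs if p > 0)

def C(freqs):
    """Relative entropy: entropy divided by its maximum for that many phonemes.
    A one-phoneme system is assigned 0 by convention."""
    m = len(freqs)
    return 0.0 if m == 1 else H1(freqs) / log2(m)

four = [0.25, 0.25, 0.25, 0.25]   # /a/ /b/ /c/ /d/
ab_c_d = [0.5, 0.25, 0.25]        # /ab/ /c/ /d/
a_b_cd = [0.25, 0.25, 0.5]        # /a/ /b/ /cd/
ab_cd = [0.5, 0.5]                # /ab/ /cd/

load_early = C(four) - C(ab_c_d)  # contrast /a/ : /b/ measured in the full system
load_late = C(a_b_cd) - C(ab_cd)  # same contrast measured after /c/, /d/ coalesce

print(C(four), C(ab_c_d), C(a_b_cd), C(ab_cd))
print(load_early, load_late)
```

The two loads are equal in size and opposite in sign, which exhibits both failures at once: one value is negative, and the value depends on where the contrast is measured.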

These facts, in my opinion, constitute serious defects in C as
a measure of functional load. On the other hand, Shannon has shown
that approximations to C converge more rapidly than do those to H.
For real phonemic systems, which are much more complicated, of course,
than our artificial example, it may be that C affords a more easily
computable and sufficiently accurate approximation to H; but this
should be tested empirically.

[9] This is the measure used by William S.-Y. Wang and James W.
Thatcher in "The Measurement of Functional Load," Report No. 8,
Communication Sciences Laboratory, The University of Michigan, April 1962.
3. CASE 2 — ALLOPHONES

3.1. For application to historical linguistics, functional load
in terms of phonemes will not usually suffice. In the course of
history, it is in the first instance not phonemes but allophones that change
as to physical properties, and it is certain of these allophonic changes
that entail restructurings of the phonemic system.

Let $L^m$ involve phonemes /1/, /2/, ..., /m/; and let the allophones of
phoneme /i/ be $[i_1], [i_2], \ldots, [i_{r_i}]$, where $r_i \ge 1$ for
every i. Also, let

$$r = \sum_{i=1}^{m} r_i, \qquad r \ge m.$$

Then $L^r$ is the same system as $L^m$, but viewed as composed of allophones
rather than phonemes.

3.2. In the kind of linguistics that uses allophones and phonemes,
we have the first two of the following assumptions about allophones; in
any kind of linguistic theory we have the second two:

(A1) A given allophone in a given environment always
represents the same phoneme.

(A2) If two allophones belong to the same phoneme, they
are in nonintersecting distribution.

(A3) In course of time, two allophones may coalesce.

(A4) In course of time, a single allophone may split into
two, but only if the two new allophones are in
nonintersecting distribution.
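
Assumption A2 lends itself to a mechanical check. The following is a minimal Python sketch and not part of the original argument: the toy data and the names `distribution`, `phoneme_of`, and `violations_of_a2` are hypothetical, and an "environment" is simplified to a (preceding, following) pair of labels.

```python
from itertools import combinations

# Hypothetical toy data: each allophone is mapped to the set of
# environments in which it occurs. An environment is simplified here
# to a (preceding, following) pair; "#" marks an utterance boundary.
distribution = {
    "p_aspirated":   {("#", "V")},
    "p_unaspirated": {("s", "V")},
    "b":             {("#", "V"), ("V", "V")},
}

# Hypothetical phonemicization: allophone -> phoneme.
phoneme_of = {"p_aspirated": "p", "p_unaspirated": "p", "b": "b"}


def violations_of_a2(distribution, phoneme_of):
    """Return pairs of allophones that share a phoneme AND an environment.

    Assumption A2 requires this list to be empty.
    """
    bad = []
    for x, y in combinations(sorted(distribution), 2):
        same_phoneme = phoneme_of[x] == phoneme_of[y]
        overlap = distribution[x] & distribution[y]
        if same_phoneme and overlap:
            bad.append((x, y))
    return bad


print(violations_of_a2(distribution, phoneme_of))
```

With the phonemicization shown, no pair is reported. Assigning "b" to the phoneme "p" instead would be flagged, since "b" and "p_aspirated" both occur in the environment ("#", "V").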

Assumption A1 guarantees that we can know what phoneme an allophone
represents without knowing anything about the grammar or semantics of
the utterance in which it occurs (separability of phonology from grammar).
Assumption A4 is the modern form of the neogrammarian principle of
regularity of sound change. From A2, two other facts immediately follow:

(A2.1) If two allophones are in intersecting distribution,
they belong to different phonemes.

(A2.2) If a phoneme occurs in an environment, it is
represented there always by the same allophone.

3.3. Suppose, now, that we examine a system $L^{[r]}$, related to a
given $L^{/m/}$, but that we disregard the phonemic affiliations of the
allophones and attempt to measure the functional load of the system
directly in terms of allophones and their distribution. Let our
measure be the H of Sec. 2.2.4. We have the following

Theorem 1. $H(L^{[r]}) = H(L^{/m/})$.

This says that the entropy of a system is the same whether we measure 
it in terms of allophones or of phonemes; also, that the entropy is 
invariant from one phonemicization to another as long as all phonemici- 
zations accord with assumptions A1 and A2 above. 

Proof. A pair of allophones contribute nothing to the load unless 
they are in contrast. If they are in contrast, then, by A2.1, they belong 
to different phonemes, and, by A2.2, each phoneme is represented by just 
this allophone in any environment in which the two allophones contrast. 
Thus the relative frequencies of the allophones, in any such environ- 
ment, are just the relative frequencies of the phonemes they represent. 
Since these relative frequencies of phonemes in environments are just
the determinants of $H(L^{/m/})$, exactly the same (nonzero) relative
frequencies determine $H(L^{[r]})$.
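
The content of Theorem 1 can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the H of Sec. 2.2.4 is not reproduced here, so the Shannon entropy of the distribution over whole utterances is used as a stand-in, and the corpus, the allophone rule, and the function names are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy corpus: phonemic utterances with frequencies.
corpus = Counter({"pat": 4, "bat": 3, "spat": 2, "tap": 1})


def to_allophones(utterance):
    """Rewrite a phoneme string as a tuple of allophones.

    Toy rule (an assumption): /p/ is [ph] utterance-initially and [p]
    elsewhere; every other phoneme has a single allophone. The rule
    depends only on the environment, so A1 and A2 hold.
    """
    out = []
    for i, ph in enumerate(utterance):
        if ph == "p":
            out.append("ph" if i == 0 else "p")
        else:
            out.append(ph)
    return tuple(out)


def entropy(counts):
    """Shannon entropy (in bits) of a frequency table."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


# The same corpus, viewed as strings of allophones.
allophonic = Counter()
for utt, n in corpus.items():
    allophonic[to_allophones(utt)] += n

h_phonemic = entropy(corpus)
h_allophonic = entropy(allophonic)
print(math.isclose(h_phonemic, h_allophonic))
```

Because each allophone is fully determined by its phoneme and environment, the allophonic transcription is a one-to-one relabeling of the utterances, and any entropy computed from their relative frequencies is unchanged.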

3.4. An allophonic split or coalescence can affect a phonemic
system in various ways. In some instances, the only change is in what
we might call the "internal economy" of one or more phonemes: that is,
a phoneme gains or loses an allophone, but continues to be represented
in the same environments as before; or an allophone switches its
affiliation from one phoneme to another, but without changing the number
of phonemes and without altering the contrasts in any environment.

For our purposes, any alterations of the kinds just described are
irrelevant. An allophonic change is system-changing if and only if
it does one of the following: (1) changes the number of phonemes in
the system, or (2) alters the contrasts in some environment. From A4,
an allophonic split cannot be system-changing. From the other
assumptions, a coalescence cannot be system-changing if the two allophones
belong to the same phoneme before the coalescence. This leaves two
types of coalescence that are, or may be, system-changing:

(1) Suppose /a/ and /b/ are not in contrast, and that $[a_1]$ and
$[b_1]$ coalesce. No new contrasts are introduced, nor are any lost, so
that the load of the system is unchanged. But if, say, $[a_1]$ is the only
allophone of /a/, and if the new allophone $[a_1 b_1]$ belongs to /b/, then
the number of phonemes has been reduced by one. Unfortunately, there is
no way of stating (in purely formal terms) whether the coalescent
allophone will be assigned to /b/ or to /a/. This depends on
"phonetic similarity", or on distribution of phonological components--
or, indeed, on the individual linguist's taste and prejudices.

(2) Suppose /a/ and /b/ are in contrast, and that $[a_1]$ and $[b_1]$,
the respective representatives of /a/ and /b/ in one of the
environments in which both occur, coalesce. A coalescence under these
conditions is always system-changing. But the exact consequences depend
on further factors. We need to know whether one of the allophones, say
$[a_1]$, is or is not the only allophone of its phoneme. And we need to
know whether the coalescence is compensated or uncompensated.

Suppose $[c_1]$ and $[c_2]$, allophones of /c/, occur respectively in
environments $E_1$ and $E_2$, and that the sole difference between $E_1$ and
$E_2$ is that $E_1$ involves an allophone [x] exactly where $E_2$ involves a
different allophone, [y]. By A2.1, then, [x] and [y] must belong to
different phonemes, because they occur in identical environments--
namely, what is left of either $E_1$ or $E_2$, plus /c/. We can therefore
take [x] = $[a_1]$ and [y] = $[b_1]$. Now suppose that $[a_1]$ and $[b_1]$ coalesce.
This coalescence renders $E_1$ and $E_2$ identical, so that $[c_1]$ and $[c_2]$,
by A2.1, must now belong to different phonemes. A coalescence of $[a_1]$
and $[b_1]$ under these conditions is compensated. Under any other
conditions, it is uncompensated.

The effect of coalescences of type (2) on phoneme-count is thus
as follows: An uncompensated coalescence leaves the phoneme-count
unchanged if both coalescing allophones belong to phonemes that have
other allophones, but reduces the count by one if one of the coalescing
allophones is the only allophone of its phoneme. For example, let the
one-allophone phoneme be /a/; then the coalescent product $[a_1 b_1]$ must
belong to /b/--the indeterminacy of type (1) is not encountered. A
compensated coalescence increases the phoneme-count by one if both
coalescing allophones belong to phonemes that have other allophones,
but it leaves the count unchanged if one of the coalescing allophones
is the only allophone of its phoneme--for though a phoneme is lost
by merger with another, another phoneme is split in two.
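
The four cases just enumerated can be tabulated mechanically. The following Python sketch merely restates the paragraph above; the function name `phoneme_count_change` and its two boolean parameters are hypothetical conveniences.

```python
def phoneme_count_change(compensated, sole_allophone):
    """Change in the number of phonemes after a type (2) coalescence.

    compensated    -- True if the coalescence is compensated (Sec. 3.4)
    sole_allophone -- True if one of the coalescing allophones is the
                      only allophone of its phoneme
    """
    change = 0
    if sole_allophone:
        change -= 1   # a phoneme is lost by merger with another
    if compensated:
        change += 1   # the phoneme /c/ is split in two
    return change


# Print the full table of cases.
for compensated in (False, True):
    for sole in (False, True):
        print(compensated, sole, phoneme_count_change(compensated, sole))
```

The two effects are independent, which is why a compensated coalescence involving a one-allophone phoneme leaves the count unchanged: one phoneme is lost and another is gained.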

3.5. The effect of coalescence on total load can be summarized
by saying that the total load cannot be increased. An uncompensated
coalescence may reduce it. A compensated coalescence cannot increase
it: the new contrast between $[c_1]$ and $[c_2]$ can make exactly the same
contribution made before the change by $[a_1]$ and $[b_1]$, but not more.
The load loss is zero, for a compensated coalescence, if and only if the
environments $E_1$ and $E_2$, involving respectively $[a_1]$ and $[b_1]$, account
for all the occurrences of at least one of those two allophones.

This takes care of the only possibly nonobvious part of the proof
of the following

Theorem 2. If [x] and [y] are any two allophones, not necessarily
distinct, and /x/ and /y/ are the phonemes, not necessarily distinct,
to which [x] and [y] respectively belong, then $H([x],[y]) \le H(/x/,/y/)$.

4. CASE 3 -- COMPONENTS

4.1. We shall now speak of a system $L^{|t|}$ whose elements |1|,
|2|, ..., |t| are the characters of a componential alphabet. Each
character is a "simultaneous bundle" (formally, merely an unordered set)
$\{e_1, e_2, \ldots, e_n\}$ of one or more distinct components, of which there is
a finite stock $F = \{c_1, c_2, \ldots\}$; that is, in any character each
$e_i$, $1 \le i \le n$, is one or another of the $c_j$ of the stock F. At least
some of the components, we assume, occur in more than one of the
characters. From one character to another, n can vary, but we assume that at
least one character contains more than one component. Two characters
are distinct if and only if one contains at least one component missing
from the other. The components are pairwise distinguishable; therefore
so are the characters. Every utterance of the language of which $L^{|t|}$ is
the phonological system consists, without residue, of a string of occurrences
of bundles of components; but not every set of components constitutes a
bundle, and not every string of occurrences of bundles is necessarily
an utterance of the language.

Traditional phonemic theory attempted to set forth and to exploit
the redundancy of any natural language by grouping allophones into
units called "phonemes" on the basis of complementary distribution
and phonetic similarity. The purposes of this operation are no longer
entirely clear to me, except for the obvious but somewhat extraneous
aim of achieving a simple yet accurate notation. The issue was confused
by a desire to achieve, at the same time, as simple and nearly invariant
as possible a notation for elements of a very different kind (morphemes).

The componential approach does not manipulate simultaneous bundles
the way traditional phonemic theory manipulated allophones. Instead,
limitations of distribution and co-occurrence are discovered and dealt
with directly in terms of the components themselves, environments being
simultaneous as well as successive. A regularly occurring and regularly
observable feature of articulation or sound is not necessarily recognized
as a component wherever it occurs; some or all of its occurrences may
turn out to be predictable from the occurrences and arrangements of
other features, provided the latter are formally recognized as components.
Different practitioners of our craft go about this sort of analytic
operation in different ways; there is little consensus as to the logic.
What is even more troublesome, there are wild disagreements among analysts
as to what they are willing to admit they hear in the utterances of one
and the same language--even their own native language.

While it would be wrong to pass over these controversies in complete
silence, we can stop now. Given care on one point, our formalism, as
set forth in the first paragraph of this section (4.1), stands ready
to meet the empirical demands of whatever version of the componential
approach emerges victorious.

The one point is that it would be awkward to have to talk about
a component coalescing with nothing--that is, disappearing or appearing.
Suppose we interpret English |p| as containing all the components present
in |b|, plus "voicelessness". That is, we "zero out" the voicing of |b|,
which is not at all to deny that |b| is voiced, but to choose to regard
voicing simply as what one has except just when the "voicelessness"
component is present. A merger of |p| and |b| in some environment would
then be difficult to describe within our formalism. The way to avoid
this trouble is very simple. For other purposes we "zero out" as merrily
as we please; but for the investigation of functional load (and perhaps,
in general, for the discussion of linguistic change) we do not. English
|p| and |b| may share a number of components, but each must be recognized
as having a component missing from the other: |p| has voicelessness but
not voicing; |b| has voicing but not voicelessness.

As a consequence, we must recognize a larger stock of components
than may be necessary for certain other purposes. What is more, at
least some components come in small sets that are mutually exclusive
in occurrence, in that if one of such a set is present in a bundle,
none of the others of the same set can be. A bundle cannot be both
voiceless and voiced; in most languages, at least, it cannot be bilabial
and at the same time apico-alveolar or dorso-velar. This is all
commonplace, and makes no trouble. Indeed, for the present discussion we
can now forget about it.
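
The formalism of Sec. 4.1 maps naturally onto unordered sets. The sketch below is one possible rendering under stated assumptions, not a claim about any particular analysis: the component labels, the bundles `P`, `B`, `M`, the list `EXCLUSIVE_SETS`, and the function names are all hypothetical.

```python
# Hypothetical component labels; no particular inventory is implied.
P = frozenset({"bilabial", "stop", "voiceless"})
B = frozenset({"bilabial", "stop", "voiced"})
M = frozenset({"bilabial", "nasal", "voiced"})

# Sets of mutually exclusive components (illustrative only).
EXCLUSIVE_SETS = [
    {"voiceless", "voiced"},
    {"bilabial", "apico-alveolar", "dorso-velar"},
]


def distinct(x, y):
    """Two characters are distinct iff one has a component the other lacks."""
    return bool(x - y) or bool(y - x)


def well_formed(bundle):
    """A bundle may hold at most one component from each exclusive set."""
    return all(len(bundle & s) <= 1 for s in EXCLUSIVE_SETS)


def differ_only_in(x, y):
    """Return the components private to each bundle, as (x_only, y_only)."""
    return x - y, y - x


print(distinct(P, B), well_formed(P), well_formed(P | B))
print(differ_only_in(P, B))
```

Note that `P` and `B` each carry a component missing from the other, as the text requires for the study of functional load; a representation in which voicing was "zeroed out" would make `B` a proper subset of `P`.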

4.2. With $L^{|t|}$ defined as in Sec. 4.1, and H as in Sec. 2.2, it
turns out that $H(L^{|t|})$ is the entropy of a phonological system handled
in terms of components and bundles, and that H(|a|,|b|) is the load
carried by a particular pair of bundles of components.

Now, is there any simple relation between $H(L^{/m/})$ and $H(L^{|t|})$,
assuming that we are dealing with the same language? Assuredly there
should be. In fact, the two ought to be equal, since we hardly want
the entropy of a phonological system to depend on the theoretical
preferences of the analyst. What we can actually assert, however, is
only as follows. Suppose a given language has t = r distinct bundles
of components, and that just these r = t bundles are taken as the
allophones of the system by someone who analyzes by the methods of
traditional phonemics. From Theorem 1 (Sec. 3.3) we know that
$H(L^{/m/}) = H(L^{[r]})$. But clearly $H(L^{[r]}) = H(L^{|t|})$, since under the
stated circumstances $L^{[r]} = L^{|t|}$. Hence, under just these conditions
we know that $H(L^{/m/}) = H(L^{|t|})$.

This conclusion is a sort of Bill of Rights for the phonemicist.
Of course, anyone can make mistakes of hearing or recording, and thus
vitiate his results, but that is not the sort of thing we can deal 
with here. Setting this aside, our conclusion means that the pho- 
nemicist, as long as he operates within the constraints of assumptions 
A1 and A2 of Sec. 3.2, and takes simultaneous bundles of components 
as his allophones, can tinker with his data in any way he wishes, 
for any purpose he seeks (such as an elegant linear notation), with 
no fear that he is throwing information away. 

4.3. The componential approach permits us to measure the functional 
load carried by a contrast between two components, either in a 
specific environment or in all environments, instead of only that 
carried by two whole bundles (or allophones).

When we want to measure the load carried by the contrast between
two components in a specific environment, it is usually because there
are two bundles |a| and |b|, which differ only in that |a| contains component
$e_1$ but not $e_2$, while |b| contains $e_2$ but not $e_1$, and both of which occur
in the same environment E (of preceding and/or following bundles). In
this case, we define a "derived" system L' as identical with $L^{|t|}$
except that in L' |a| and |b| have, in environment E (but not elsewhere),
coalesced into a single bundle |ab|. In this particular environment,
then, either $e_1$ has been replaced by $e_2$, or $e_2$ has been replaced by
$e_1$, or both $e_1$ and $e_2$ have been replaced by some coalescent component
different from both. (It does not matter which of these is the
case.) Then the very specific functional load being sought is
$H(L^{|t|}) - H(L')$.
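
The construction of the derived system L' can be sketched in code. This is an illustration under stated assumptions only: the entropy of the distribution over whole utterances stands in for the H of Sec. 2.2, an environment is reduced to the immediately preceding and following symbols, and the toy corpus and the names `coalesce`, `load`, `initial`, and `final` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy corpus: utterances (strings of bundle labels) and counts.
corpus = Counter({"pat": 4, "bat": 4, "tap": 2, "tab": 2})


def entropy(counts):
    """Shannon entropy (in bits) of a frequency table."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def coalesce(corpus, a, b, in_env):
    """Derive L': merge bundles a and b wherever in_env(prev, next) holds."""
    derived = Counter()
    for utt, n in corpus.items():
        padded = "#" + utt + "#"          # "#" marks an utterance boundary
        new = []
        for i, ch in enumerate(utt):
            prev, nxt = padded[i], padded[i + 2]
            if ch in (a, b) and in_env(prev, nxt):
                new.append("(" + a + b + ")")   # the coalesced bundle |ab|
            else:
                new.append(ch)
        derived[tuple(new)] += n
    return derived


def load(corpus, a, b, in_env):
    """Functional load of a vs. b in the given environment: H(L) - H(L')."""
    return entropy(corpus) - entropy(coalesce(corpus, a, b, in_env))


def initial(prev, nxt):
    return prev == "#"


def final(prev, nxt):
    return nxt == "#"


print(load(corpus, "p", "b", initial), load(corpus, "p", "b", final))
```

In this toy corpus the initial contrast distinguishes the more frequent pair of utterances and so carries the larger load (2/3 bit against 1/3 bit), although the same two bundles are involved in both environments.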

We might seek to determine the functional load carried, not by
the contrast between single components in a given environment, but
rather by the contrast between two sets of components in that
environment, neither set necessarily being large enough to constitute a whole
character. Let us say that |a| and |b| are the same except that |a|
contains $\{e_1, e_2, \ldots, e_k\}$ where |b| contains $\{e'_1, e'_2, \ldots, e'_k\}$; we are
to understand that, for $1 \le i \le k$, $e_i$ and $e'_i$ cannot both be present
in the same character. The definition and procedure are, of course,
exactly analogous to the preceding case. This would be the
appropriate technique for determining the importance of the distinction
of medial consonants in English matter and madder, and the like,
present in British English but lost, as we noted earlier, from most
American English dialects.

To measure the total load carried by a contrast between two
components, we derive from $L^{|t|}$ a system L' in which the two components
have coalesced in all environments, but nothing else has happened.
Any pair of bundles which differ (in $L^{|t|}$) only in that one contains
one of the components while the other has the other is thus coalesced
into a single bundle in L'. Then the desired load is $H(L^{|t|}) - H(L')$.
This would be the appropriate procedure, for example, for determining
the importance of voicelessness versus voicing in English.
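
The total load of a component contrast can be sketched in the same spirit. Again this is only an illustration under assumptions: utterance-level Shannon entropy stands in for the H of Sec. 2.2, and the bundles, component labels, corpus, and function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical bundles (component labels are illustrative).
P = frozenset({"bilabial", "stop", "voiceless"})
B = frozenset({"bilabial", "stop", "voiced"})
T = frozenset({"apical", "stop", "voiceless"})
D = frozenset({"apical", "stop", "voiced"})
A = frozenset({"vowel", "low"})

# Toy corpus: utterances as tuples of bundles, with frequencies.
corpus = Counter({(P, A, T): 3, (B, A, T): 3, (P, A, D): 1, (T, A, P): 1})


def entropy(counts):
    """Shannon entropy (in bits) of a frequency table."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def merge_components(bundle, c1, c2):
    """Replace c1 or c2 in a bundle by a single coalescent component."""
    if c1 in bundle or c2 in bundle:
        return (bundle - {c1, c2}) | {c1 + "/" + c2}
    return bundle


def total_load(corpus, c1, c2):
    """H(L) - H(L'), with c1 and c2 coalesced in every environment."""
    derived = Counter()
    for utt, n in corpus.items():
        derived[tuple(merge_components(b, c1, c2) for b in utt)] += n
    return entropy(corpus) - entropy(derived)


print(total_load(corpus, "voiceless", "voiced"))
```

Coalescing voicelessness with voicing merges `P` with `B` and `T` with `D` everywhere, so three of the four utterance types fall together. Coalescing two components that never distinguish bundles, such as "vowel" and "low" here, yields a load of zero.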

5. DISCUSSION 

We have shown how functional load can be quantified in any of
three different frames of reference: phonemic, allophonic, or
componential. And we have described three interrelated measures, one
based on H, the entropy in binits, one based on fi, the entropy in
shannons, and one based on C, the relative entropy.

A very small amount of empirical work has been devoted to the
determination of the redundancy R of languages (usually in written
rather than spoken form).[10] The redundancy is defined as 1 - C.
If, in some system, every string of characters is a message, then
the relative entropy is unity and the redundancy is zero.
In such a system, any change in a message between transmitter and
receiver, brought about by channel noise, is an uncorrectable error.
Since channel noise cannot be completely eliminated, redundancy plays
a useful role. The empirical work referred to above suggests that
the redundancy of natural languages hovers in the vicinity of .50.
We shall use this figure instead of a completely arbitrary symbol,
but pending further empirical study it should be taken as purely
tentative.

[10] Some estimates for written English are given in C. E. Shannon,
"The Mathematical Theory of Communication," in C. E. Shannon and W.
Weaver, The Mathematical Theory of Communication, Urbana, 1949;
see also Claude E. Shannon, "Prediction and Entropy of Printed English,"
Bell System Technical Journal, Vol. 30, pp. 50-64, 1951. I know of
no printed data on spoken English, but a decade ago I attempted some
determinations using phonemic transcription rather than standard
orthography (and using test audiences familiar with the transcription);
the results pointed towards a figure approximately the same (.50) as
that for orthographic English. Clearly, little confidence should be
placed in that figure; further empirical study is a desideratum.
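
The relation R = 1 - C can be made concrete with a deliberately crude estimate. The sketch below assumes, following Shannon's usage, that the relative entropy C is the ratio of the entropy to its maximum for the same alphabet, and it uses single-character frequencies only; both the estimator and the function name are assumptions of this illustration, not the procedure behind the .50 figure.

```python
import math
from collections import Counter


def relative_entropy_and_redundancy(text):
    """Unigram estimate of C = H / H_max and R = 1 - C.

    This ignores all dependence between successive characters, so it is
    only a crude first approximation (an assumption of this sketch).
    The text must contain at least two distinct characters.
    """
    counts = Counter(text)
    total = sum(counts.values())
    h = -sum(c / total * math.log2(c / total) for c in counts.values())
    h_max = math.log2(len(counts))     # every character equally likely
    c = h / h_max
    return c, 1 - c


print(relative_entropy_and_redundancy("abababab"))
print(relative_entropy_and_redundancy("aaaaaaab"))
```

The first string uses its two characters equally often, so at this level of approximation its relative entropy is unity and its redundancy zero; the second is heavily skewed and therefore redundant. Estimates that take context into account, as Shannon's do, give higher redundancy than a unigram count.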

One may now propose that, in the long run, the phonological system
of the language of any community is governed by a law something like
that which controls the behavior of a harmonic oscillator: if C deviates
from its "neutral" value of .50, then there is a "restoring force"
$\Phi = K(.50 - R)$, where K is a constant, that presses C back towards
the "neutral" value. The greater the displacement from neutral,
the greater the restoring force. If $\Phi$ is positive, the redundancy is
low and the relative entropy high: $\Phi$ tends to increase the former and
decrease the latter. If $\Phi$ is negative, the redundancy is high and
the relative entropy low; $\Phi$ tends to decrease the former and increase
the latter.
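
The sign convention of the proposed law can be checked with a few values. This sketch reads the restoring force as K(.50 - R), which is the form implied by the statement that a positive force goes with low redundancy; the function name and the value of K are hypothetical.

```python
NEUTRAL = 0.50   # the tentative balance point used in the text


def restoring_force(redundancy, k):
    """K * (.50 - R); positive when redundancy is below the neutral value."""
    return k * (NEUTRAL - redundancy)


for r in (0.40, 0.50, 0.60):
    c = 1 - r                          # relative entropy, since R = 1 - C
    print(r, round(c, 2), round(restoring_force(r, k=2.0), 3))
```

A redundancy of .40 (relative entropy .60) gives a positive force, pushing redundancy up; a redundancy of .60 gives a negative force of the same magnitude; at .50 the force vanishes.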

This is perhaps more metaphor than mathematics, but let us see
how it might work. Suppose, first, that the redundancy has become
too low. Utterances are misunderstood oftener than usual. Ambiguous
phrasings are therefore replaced or paraphrased by less ambiguous ones.
For example, at that stage in the history of English when "Let him!"
could be understood as a request either to leave him alone or to stop
him, people began saying something like "Stop him!" if that was what
they wanted done.[11] Also, people come to articulate more carefully.
But this really means the same thing, since typically a given sentence
said rapidly and carelessly and the "same" sentence said slowly and
carefully are not phonologically identical--the slow careful form retains
stigmata of identity that are discarded in the rapid form. In general,
then, utterances come to be distinguished from one another by larger
numbers of occurrences of phonological units. This decreases the
entropy and the relative entropy, and increases the redundancy.

[11] Leonard Bloomfield, Language, New York, 1933, p. 398. It is
not implied, of course, that at that period in the history of English
the overall redundancy had become too low. Indeed, perhaps it never
does, because perhaps adjustments in specific instances, such as the
one cited in the text, are made too quickly for there to be any
measurable diminution of redundancy for the whole language. This has
to do with the magnitude of the constant K, discussed in the last
paragraph of the paper.

Suppose, next, that the redundancy has become unnecessarily high.
On the average, speakers are doing more work than necessary for
intelligibility. Through laziness, "least effort," or whatever principle is
actually involved here--clearly some principle of this sort is a reality--
articulation becomes less careful. In such rapid careless speech,
phonological units that are articulatorily similar can easily coalesce;
and if there is little resort to slow-speech alternatives, then the fuller
phonological structure of the slow-speech forms can be forgotten. In
general, then, utterances come to be distinguished from one another by
fewer occurrences of phonological units. This increases the entropy
and the relative entropy, and decreases the redundancy.

Our "force" $\Phi$, then, is actually the vector sum of two forces:
one, which we might as well call "laziness," presses towards lower
redundancy; the other, which is the practical need to be understood,
presses towards lower relative entropy. Of course, both of these
forces are statistical averages over whole communities of people and
over many varied circumstances in which speech takes place--except in
this gross statistical sense, we are not asserting that "people are
naturally as lazy as they can be" or anything of the sort. At any one
period, in any one community, the two forces have to operate via the
actual linguistic system of the community, as it has been inherited,
with all its arbitrary conventions. One could not venture, merely through
the recognition of our two conflicting forces, to predict in any detail
the near future of the language of any community. Even if one had
considerably detailed information about the arbitrary conventions of the
linguistic system, predictability would be severely limited, since so
many different sorts of changes could equally well throw the two forces
out of balance, and so many different specific adjustments could restore
the balance.

To complete the metaphor (if that is what it is), we may ask if
anything might be empirically determinable about the constant K.
If K is very small, then momentary deviations from balance--that is,
from R = C = .50--might well be rather large. If K is large, then
deviations are going to be small, and the restoration of balance is
going to be more rapid. It may even be that K is an arbitrary constant,
different from one language--or, perhaps, from one way of life--to
another. It might even be that the balance point, which we have taken
as .50, is different, say, between neolithic Polynesians and industrialized
European-Americans. We have no information on these matters, but I see
no reason why it could not be obtained if we want to obtain it.
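
The qualitative role of K can be illustrated with a toy relaxation. Everything here is an assumption made for illustration: the discrete update rule, the tolerance, the starting redundancy, and the two values of K are hypothetical, and nothing is claimed about actual time scales of linguistic change.

```python
NEUTRAL = 0.50


def steps_to_rebalance(r0, k, tol=0.01, max_steps=10_000):
    """Count update steps until |R - .50| <= tol under R += K * (.50 - R).

    The discrete update rule is an assumption of this sketch; it is
    stable only for 0 < K < 2.
    """
    r, steps = r0, 0
    while abs(r - NEUTRAL) > tol and steps < max_steps:
        r += k * (NEUTRAL - r)
        steps += 1
    return steps


for k in (0.05, 0.5):
    print(k, steps_to_rebalance(0.30, k))
```

Starting from the same displacement, the larger K restores the balance in far fewer steps, which is the sense in which a large K makes for small and short-lived deviations.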




